AITopics | rgb stream

rgb stream

VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living

Das, Srijan, Dai, Rui, Yang, Di, Bremond, Francois

arXiv.org Artificial Intelligence, May-17-2021

Abstract: Many attempts have been made to combine RGB and 3D poses for recognizing Activities of Daily Living (ADL). ADL can look very similar to one another, and distinguishing them often requires modeling fine-grained details. Because recent 3D ConvNets are too rigid to capture the subtle visual patterns across an action, this research direction is dominated by methods combining RGB and 3D poses. However, computing 3D poses from an RGB stream is costly in the absence of appropriate sensors, which limits the use of these approaches in real-world applications requiring low latency. How, then, can 3D poses best be exploited for recognizing ADL? To this end, we propose an extension of a pose-driven attention mechanism, Video-Pose Network (VPN), exploring two distinct directions. One transfers pose knowledge into RGB through feature-level distillation; the other mimics pose-driven attention through attention-level distillation. Finally, these two approaches are integrated into a single model, which we call VPN++. We show that VPN++ is not only effective but also provides a high speed-up and high resilience to noisy poses. VPN++, with or without 3D poses, outperforms the representative baselines on 4 public datasets.
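To make the feature-level distillation idea in this abstract concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not state which loss is used, so the mean squared error, the `weight` parameter, and both function names are assumptions chosen for illustration. The RGB "student" features are pulled toward the pose "teacher" features, and that penalty is added to the ordinary classification loss.

```python
from typing import Sequence


def feature_distillation_loss(
    student_feat: Sequence[float], teacher_feat: Sequence[float]
) -> float:
    """Mean squared error between student (RGB) and teacher (pose) features.

    Assumption: MSE is used here only as a common choice of distillation
    penalty; the abstract does not specify the actual loss.
    """
    if len(student_feat) != len(teacher_feat):
        raise ValueError("student and teacher features must have the same length")
    if len(student_feat) == 0:
        raise ValueError("features must be non-empty")
    squared_diffs = ((s - t) ** 2 for s, t in zip(student_feat, teacher_feat))
    return sum(squared_diffs) / len(student_feat)


def total_loss(
    classification_loss: float,
    student_feat: Sequence[float],
    teacher_feat: Sequence[float],
    weight: float = 0.5,  # hypothetical trade-off coefficient
) -> float:
    """Classification loss plus a weighted distillation penalty."""
    return classification_loss + weight * feature_distillation_loss(
        student_feat, teacher_feat
    )


if __name__ == "__main__":
    rgb_features = [1.0, 2.0]   # toy student features
    pose_features = [1.0, 4.0]  # toy teacher features
    print(total_loss(1.0, rgb_features, pose_features))
```

Under this kind of setup the pose branch is needed only during training, so inference could run on RGB alone, which matches the abstract's claim that the model works with or without 3D poses.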

artificial intelligence, distillation, machine learning

arXiv:2105.08141

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.04)
North America > United States > Texas > Harris County > Spring (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

MARS: Motion-Augmented RGB Stream for Action Recognition - Naver Labs Europe

#artificialintelligence, Feb-29-2020, 05:08:44 GMT

This blog post presents our CVPR'19 paper, "MARS: Motion-Augmented RGB Stream for Action Recognition", done with the Thoth team at Inria. The code and trained models are available here. Action recognition in videos requires processing both spatial and temporal information. Although CNNs have been quite successful at modeling spatial information, their performance at modeling temporal information has been subpar. Current state-of-the-art techniques use 3D-CNN-based two-stream architectures trained on a large dataset: one stream processes appearance information from RGB frames, while the other handles motion information from optical flow. However, computing optical flow introduces latency, which limits the use of these methods in real-time applications.
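The two-stream baseline described in this excerpt can be sketched as late fusion of per-class scores. This is an illustration only, not the MARS code: the excerpt does not say how the two streams are combined, so the weighted average of softmax probabilities, the `rgb_weight` parameter, and all function names are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(
    rgb_logits: Sequence[float],
    flow_logits: Sequence[float],
    rgb_weight: float = 0.5,  # hypothetical fusion weight
) -> List[float]:
    """Weighted average of the class probabilities of the two streams."""
    if len(rgb_logits) != len(flow_logits):
        raise ValueError("both streams must score the same number of classes")
    rgb_probs = softmax(rgb_logits)
    flow_probs = softmax(flow_logits)
    return [
        rgb_weight * r + (1.0 - rgb_weight) * f
        for r, f in zip(rgb_probs, flow_probs)
    ]


def predict(rgb_logits: Sequence[float], flow_logits: Sequence[float]) -> int:
    """Index of the class with the highest fused probability."""
    fused = fuse_two_streams(rgb_logits, flow_logits)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Toy logits: the appearance stream favours class 0, the motion stream class 1.
    print(predict([2.0, 0.0], [0.0, 1.0]))
```

In a pipeline like this, `flow_logits` can only be produced after optical flow has been computed for the clip, which is where the latency mentioned in the excerpt comes from.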

information, motion-augmented rgb stream, rgb stream

Country: Europe (0.40)

Genre: Research Report (0.39)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Bilinear Faster RCNN with ELA for Image Tampering Detection

Yancey, Robin Elizabeth

arXiv.org Machine Learning, Apr-7-2019

With technological advances leading to more mechanisms of image tampering, fraud detection methods must continue to be upgraded to match their sophistication. One problem with current methods is that they require prior knowledge of the forgery method in order to determine which features to extract from the image to localize the region of interest. When a machine learning algorithm learns different types of tampering from a large set of varied images, a big enough database makes it easy to classify which images are tampered (by training on the entire feature map of each image), but this still leaves open the question of which features to train on and how to localize the manipulation. To solve this, object detection networks such as Faster RCNN, which combine an RPN (Region Proposal Network) with a CNN, have recently been adapted to fraud detection, using their ability to propose bounding boxes for objects of interest to localize the tampering artifacts. In this work, an existing bilinear Faster RCNN model is modified so that its second stream takes as input the ELA (Error Level Analysis) JPEG compression level mask.

artificial intelligence, detection, machine learning

arXiv:1904.08484

Country:

Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre: Research Report (0.64)

Industry: Law Enforcement & Public Safety (0.55)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
